Extreme Value Statistics and Traveling Fronts: Various Applications 
An intriguing connection between extreme value statistics and traveling fronts has recently been found in a number of diverse problems. In this short review we outline a few such problems and consider their various applications.
Independence of random events is the most desirable property in probability theory and statistical physics. If we have a collection of independent random variables $X_1, \ldots, X_N$ with finite variance, then the distribution of the (suitably centered) sum $\sum_j X_j/\sqrt{N}$ is Gaussian in the thermodynamic limit $N \to \infty$. Similarly, under broad circumstances, the asymptotic distribution of the extreme values, e.g., $X_{\min} = \min(X_1, \ldots, X_N)$, belongs to one of just three possible families.

However, independence is the exception rather than the rule: random variables are often highly correlated. Little is known about the extreme value statistics of correlated random variables, yet a vast number of problems can be recast into such a scheme. The celebrated example is the traveling salesman problem, that is, to find the shortest closed tour visiting each of $N$ 'cities' once. There are $(N-1)!/2$ possible tours and the lengths of the tours are obviously correlated. This and a few other combinatorial optimization problems were recently analyzed by using techniques originally developed to study spin glasses. A nice general review of the recent progress in that direction is given by Martin, Monasson and Zecchina; outstanding progress has also been made on one specific problem, the matching problem.



In a series of recent publications [7]-[12] we have shown that there is an intriguing connection between the statistics of extreme values arising in various contexts and traveling fronts. More precisely, the cumulative distributions of extreme variables were shown to admit a traveling front solution. Such a connection was also noted in the context of a particle moving in a random potential [13]. The goal of this short review is to convince the reader that the technique of traveling fronts is a powerful tool for tackling the extreme value statistics of correlated random variables. By casting the problem in the traveling front framework, one easily determines the two leading terms in the asymptotic expansion of the average value of the extreme variable. Furthermore, the width of the distribution of the extreme variable is nothing but the width of the traveling front, and therefore the variance is usually finite. These results are very natural in the traveling wave framework yet very difficult to guess and derive using other methods.

Traveling front solutions have been found in numerous problems. To keep the discussion short, we consider the best-known example, the one-dimensional KPP (Kolmogorov, Petrovsky, Piskunov) equation, also known as the Fisher equation. This is the nonlinear partial differential equation

$$\frac{\partial \phi}{\partial t} = \frac{\partial^2 \phi}{\partial x^2} + \phi - \phi^2, \tag{1}$$
where $\phi(x,t)$ represents, for example, the density of a population at a point $x$ at time $t$. Clearly this equation has two fixed points or stationary solutions: (i) $\phi(x) = 1$ for all $x$ and (ii) $\phi(x) = 0$ for all $x$. A simple linear stability analysis shows that solution (i) is stable while solution (ii) is unstable. Therefore, if one starts with a sufficiently sharp initial condition, say $\phi(x, t=0) = 1$ for $x < 0$ and $\phi(x, t=0) = 0$ for $x > 0$, it is easy to see (for example by numerical simulation) that as time proceeds, the front separating the stable solution $\phi = 1$ from the unstable solution $\phi = 0$ advances in the forward direction with a unique velocity $v_f$. Besides, the front retains its shape in the sense that the width of the front remains finite even at large times. The front velocity $v_f$ is determined by analyzing the tail region $x \to \infty$. In this region, $\phi$ is small and one can ignore the nonlinear term $\phi^2$ in Eq. (1). The resulting linear equation allows a spectrum of decaying solutions $\phi(x,t) \propto e^{-\lambda[x - v(\lambda)t]}$, provided $v(\lambda)$ satisfies the dispersion relation

$$v(\lambda) = \lambda + \frac{1}{\lambda}. \tag{2}$$

Thus a whole family of solutions parametrized by $\lambda$ is in principle allowed. However, the front actually advances with a unique velocity $v_f$. Thus there must be a selection principle to choose the right velocity from the whole spectrum $v(\lambda)$. Note that the dispersion spectrum (2) has a unique minimum at $\lambda = \lambda^* = 1$ where $v(\lambda^*) = 2$. It was shown [15,16] that for sufficiently steep initial conditions, the extremum of the dispersion curve is selected by the front, i.e., $v_f = v(\lambda^*) = 2$. Note that while the spectrum is determined solely by the linearized equation, for a given initial condition the nonlinear term plays a crucial role in selecting the final velocity from the full spectrum allowed by the linear equation. Subsequently it was shown [17]-[19] that the front position $x_f(t)$, apart from the leading $v(\lambda^*)t$ term, has a slow logarithmic correction,

$$x_f(t) = v(\lambda^*)\,t - \frac{3}{2\lambda^*}\ln t + \cdots. \tag{3}$$
Although this velocity selection principle was originally proved only for the KPP equation, the strategy of selecting the extremum of the dispersion spectrum of the linearized equation was subsequently shown to apply to various traveling front solutions provided certain conditions are satisfied. We will show how this selection principle can be successfully used to derive exact asymptotic results for the statistics of extreme variables in a number of problems. In all the problems discussed below, we will find a traveling front solution of the same generic form (a leading 'linear' term followed by a subleading 'logarithmic' correction) as in Eq. (3). While the velocity dispersion spectrum $v(\lambda)$ will differ widely from problem to problem, the principle of selecting the extremum of the spectrum, namely $v(\lambda^*)$, will still be valid. Finally, while $\lambda^*$ and $v(\lambda^*)$ are thus nonuniversal, the prefactor 3/2 in the logarithmic correction term turns out to be universal and is just the first excited state energy of a quantum harmonic oscillator. For a short derivation of this correction term, see e.g. appendix A of Ref.
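
The front selection described above can be checked numerically. The following is a minimal sketch, not the method used in the works cited: it integrates the KPP equation (1), with unit diffusion and growth coefficients, by an explicit finite-difference scheme starting from a step initial condition, and measures the average front speed between two times. The grid spacing, time step, domain length, step location and the half-height definition of the front position are all illustrative assumptions, and the function names `simulate_kpp` and `front_position` are hypothetical.

```python
def front_position(phi, dx, level=0.5):
    """Locate the front as the point where phi crosses `level` (linear interpolation)."""
    for i in range(1, len(phi)):
        if phi[i] < level <= phi[i - 1]:
            frac = (phi[i - 1] - level) / (phi[i - 1] - phi[i])
            return (i - 1 + frac) * dx
    raise ValueError("front not found inside the domain")


def simulate_kpp(t_marks=(20.0, 40.0), length=130.0, dx=0.25, dt=0.02, x0=10.0):
    """Explicit Euler integration of d(phi)/dt = d2(phi)/dx2 + phi - phi**2.

    Returns a dict mapping each requested time to the front position.
    All numerical parameters are illustrative assumptions.
    """
    n = int(round(length / dx)) + 1
    # Step initial condition: phi = 1 to the left of x0, phi = 0 to the right.
    phi = [1.0 if i * dx < x0 else 0.0 for i in range(n)]
    mark_steps = {int(round(t / dt)): t for t in t_marks}
    r = dt / dx ** 2  # must stay below 1/2 for stability of the explicit scheme
    positions = {}
    for step in range(1, max(mark_steps) + 1):
        new = phi[:]  # end points are held fixed at 1 (left) and 0 (right)
        for i in range(1, n - 1):
            u = phi[i]
            new[i] = u + r * (phi[i + 1] - 2.0 * u + phi[i - 1]) + dt * u * (1.0 - u)
        phi = new
        if step in mark_steps:
            positions[mark_steps[step]] = front_position(phi, dx)
    return positions


if __name__ == "__main__":
    pos = simulate_kpp()
    speed = (pos[40.0] - pos[20.0]) / 20.0
    print(f"average front speed between t=20 and t=40: {speed:.3f}")
```

At these moderate times the measured speed is expected to lie somewhat below the asymptotic value $v(\lambda^*) = 2$, both because of the slow logarithmic correction to the front position and because of discretization effects.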

Directed Polymer on a Cayley Tree: As a first example, consider the problem of a directed polymer on a Cayley tree, studied by Derrida and Spohn [20], which has recently resurfaced in a number of apparently unrelated problems [21]-[23]. The primary emphasis of that work was on the spin-glass-like transition occurring at finite temperature and on fluctuation properties. In contrast, we consider exactly zero temperature and focus on the basic macroscopic quantity, namely the ground state energy [10].

We consider a tree rooted at O (see Fig. 1) where a random energy $\epsilon$ is associated with every bond of the tree. The variables $\epsilon$ are independent and each is drawn from the same distribution $\rho(\epsilon)$. A directed polymer of size $n$ goes down from the root O to any of the $2^n$ nodes at level $n$. There are $N = 2^n$ possible paths for the polymer of size $n$ and the energy of any of these paths is

$$E_{\rm path} = \sum_{i \in {\rm path}} \epsilon_i. \tag{4}$$

The set of $N = 2^n$ variables $E_1, E_2, \ldots, E_N$ are clearly correlated in a hierarchical (i.e., ultrametric) way and the two-point correlation between the energies of any two paths is proportional to the number of bonds they share. The ground state energy $E_{\min}(n) = \min[E_1, E_2, \ldots, E_{2^n}]$ is then a random variable and we are interested in its statistics.




FIG. 1. The directed polymer on a Cayley tree. The $\epsilon$'s denote the bond energies.



The suitable quantity that has a traveling front solution is the cumulative distribution $P_n(x) = {\rm Prob}[E_{\min}(n) > x]$. It satisfies a nonlinear recursion relation,

$$P_{n+1}(x) = \left[\int d\epsilon\, \rho(\epsilon)\, P_n(x - \epsilon)\right]^2, \tag{5}$$

and the initial condition $P_0(x) = \theta(-x)$, where $\theta(x)$ is the Heaviside step function. The front velocity can be determined by following the 'linearizing' strategy outlined above for the KPP equation. Unlike the KPP equation, however, we now need to linearize the recursion equation (5) in the 'backward' tail region $x \to -\infty$. By inserting $1 - P_n(x) \propto e^{\lambda[x - v(\lambda)n]}$ (with $\lambda > 0$, so that the deviation from unity decays as $x \to -\infty$) into Eq. (5), one finds a dispersion spectrum

$$v(\lambda) = -\frac{1}{\lambda}\ln\left[2\int d\epsilon\, \rho(\epsilon)\, e^{-\lambda\epsilon}\right]. \tag{6}$$

For generic distributions $\rho(\epsilon)$, this spectrum has a unique maximum at $\lambda = \lambda^*$ and by the general velocity selection principle the maximum velocity $v(\lambda^*)$ will be selected by the front. We now illustrate it for two distributions.

(i) The bimodal distribution, $\rho(\epsilon) = p\,\delta(\epsilon - 1) + (1-p)\,\delta(\epsilon)$, where $0 < p < 1$. In this case one can also think of the energy of a bond as a 'length' variable, where a bond is present with probability $p$ and absent with probability $(1-p)$. By analysing Eqs. (5) and (6) it was found that the polymer undergoes a 'depinning' transition at the critical value $p_c = 1/2$. For $p > 1/2$, the ground state energy is 'extensive' and increases linearly with $n$. On the other hand, for $p < 1/2$ the polymer is 'localized' and the ground state energy remains finite even in the $n \to \infty$ limit. More precisely, the average ground state energy $\langle E_{\min}(n)\rangle = \int_0^\infty P_n(x)\,dx$ has the asymptotic behaviors

$$\langle E_{\min}(n)\rangle \approx \begin{cases} v_{\min}(p)\,n, & p > 1/2, \\ (\ln 2)^{-1}\ln\ln n, & p = 1/2, \\ \text{finite}, & p < 1/2, \end{cases} \tag{7}$$

where $v_{\min}(p) = v(\lambda^*)$ with $v(\lambda^*)$ being the maximum of the dispersion spectrum in Eq. (6). Taking into account the correction term as in the KPP equation, we found that for $p > 1/2$, the average ground state energy is given by

$$\langle E_{\min}(n)\rangle \approx v_{\min}(p)\,n + \frac{3}{2\lambda^*}\ln n + \cdots. \tag{8}$$

In a similar way, one can compute the average maximum energy, $\langle E_{\max}(n)\rangle \approx v_{\max}(p)\,n$. Interestingly, the velocities satisfy a duality relation $v_{\min}(p) + v_{\max}(1-p) = 1$.
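
For the bimodal case the recursion for the cumulative distribution can be iterated exactly on the integers, which gives a quick check of the linear growth for $p > 1/2$ and of the localization for $p < 1/2$. The sketch below is an illustration under stated assumptions rather than the derivation used in the cited work: it tracks $G_n(k) = {\rm Prob}[E_{\min}(n) > k]$ for integer $k$, for which the recursion reduces to $G_{n+1}(k) = [p\,G_n(k-1) + (1-p)\,G_n(k)]^2$, and it locates the maximum of the spectrum $v(\lambda) = -\lambda^{-1}\ln[2(p\,e^{-\lambda} + 1 - p)]$ by a plain grid search. The function names, the grid and the values of $p$ and $n$ are hypothetical choices.

```python
import math


def mean_ground_state_energy(p, n_max):
    """Return [<E_min(0)>, ..., <E_min(n_max)>] for bond energies 1 (prob. p) or 0.

    g[k] stores G_n(k) = Prob[E_min(n) > k] for k = 0, ..., n_max.
    """
    size = n_max + 1
    g = [0.0] * size          # n = 0: E_min = 0, so Prob[E_min > k] = 0 for k >= 0
    means = [0.0]
    for _ in range(n_max):
        prev = 1.0            # G_n(-1) = 1 because energies are nonnegative
        new = []
        for k in range(size):
            new.append((p * prev + (1.0 - p) * g[k]) ** 2)
            prev = g[k]
        g = new
        means.append(sum(g))  # <E_min> = sum over k >= 0 of Prob[E_min > k]
    return means


def bimodal_velocity(p, lam_max=20.0, steps=20000):
    """Grid search for the maximum of v(lam) = -(1/lam) ln[2 (p e^{-lam} + 1 - p)].

    Returns (v_star, lam_star); the grid is an illustrative assumption.
    """
    best_v, best_lam = -math.inf, None
    for i in range(1, steps + 1):
        lam = lam_max * i / steps
        v = -math.log(2.0 * (p * math.exp(-lam) + 1.0 - p)) / lam
        if v > best_v:
            best_v, best_lam = v, lam
    return best_v, best_lam


if __name__ == "__main__":
    means = mean_ground_state_energy(0.8, 400)
    slope = (means[400] - means[200]) / 200.0
    v_star, lam_star = bimodal_velocity(0.8)
    print(f"p=0.8: measured slope {slope:.4f}, spectrum maximum {v_star:.4f}")
    localized = mean_ground_state_energy(0.3, 400)
    print(f"p=0.3: <E_min> at n=200 and n=400: {localized[200]:.6f}, {localized[400]:.6f}")
```

Truncating the array at $k = n_{\max}$ loses nothing, since the energy of a path of $n$ bonds cannot exceed $n$.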

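(ii) For the unbounded one-sided distribution $\rho(\epsilon) = e^{-\epsilon}\theta(\epsilon)$, the dispersion spectrum becomes

$$v(\lambda) = -\frac{1}{\lambda}\ln\left[\frac{2}{\lambda + 1}\right]. \tag{9}$$

The average ground state energy is given by Eq. (8) with $\lambda^* = 3.31107\ldots$ and $v(\lambda^*) = 0.23196\ldots$.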
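
The quoted values of $\lambda^*$ and $v(\lambda^*)$ can be reproduced directly from the spectrum $v(\lambda) = -\lambda^{-1}\ln[2/(\lambda+1)]$ for the exponential distribution. A minimal sketch, assuming only this form: setting $dv/d\lambda = 0$ gives the condition $\ln[(\lambda+1)/2] = \lambda/(\lambda+1)$, which is solved below by bisection. The helper names and the bracketing interval are illustrative choices.

```python
import math


def velocity(lam):
    """Dispersion spectrum v(lam) = -(1/lam) ln[2 / (lam + 1)] for rho(eps) = exp(-eps)."""
    return -math.log(2.0 / (lam + 1.0)) / lam


def stationarity(lam):
    """Vanishes where dv/dlam = 0, i.e. ln[(lam + 1)/2] = lam / (lam + 1)."""
    return math.log((lam + 1.0) / 2.0) - lam / (lam + 1.0)


def bisect(f, lo, hi, tol=1e-12):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0.0) == (f_lo > 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


lam_star = bisect(stationarity, 1.0, 10.0)
v_star = velocity(lam_star)
print(f"lambda* = {lam_star:.5f}, v(lambda*) = {v_star:.5f}")
```

The stationarity function is monotonically increasing for $\lambda > 0$, so the root is unique and corresponds to the maximum of the spectrum.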

We have also studied the asymptotic behaviors of the full distribution of the ground state energy for various $\rho(\epsilon)$ and found [10] that the hierarchical correlations between the random variables $[E_1, E_2, \ldots, E_{2^n}]$ violate the well-known Gumbel-type behavior exhibited by the distribution of the extreme of a set of independent random variables.

Random Binary Search Tree: We now outline an application of the traveling front techniques to the analysis of search algorithms in computer science. A computer is constantly fed with enormous amounts of data. It is therefore essential to organize or sort the data in an efficient way so that the computer spends the minimum time searching for an item if it is required later. There are various 'sorting' and 'search' algorithms devised for this purpose. One particular algorithm that has been widely studied by computer scientists is the Random Binary Search Tree (RBST) algorithm. To understand this algorithm, consider a simple example. Suppose the incoming data string consists of the twelve months of the year appearing in the following random order: July, September, December, May, April, February, January, October, November, March, June and August.




FIG. 2. The binary search tree corresponding to the data string in the order: July, September, December, May, April, February, January, October, November, March, June, and August. The tree has size $N = 12$ and height $H_N = 5$.



The RBST algorithm stores this data on a binary tree in the following way. A chronological order (January, February, ...) is first chosen. The first element of the input string (July) is put at the root of a tree (see Fig. 2). The next element of the string is September. One compares it with the root element (July) and sees that September is bigger than July (in chronological order). So one assigns September to a daughter node of the root in the right branch. On the other hand, if the new element were less than the root, it would have gone to the daughter node of the left branch. Then the next element is December. We compare it with the root (July) and decide that it has to go to the right; then we compare it with the existing right daughter node (September) and decide that December has to go to the node which is the right daughter of September. The process continues until all the elements are assigned their nodes on the tree. For the particular data string in the above example, we get the unique tree shown in Fig. 2. If the incoming data (consisting of $N$ elements) is completely random, then all $N!$ possible orderings of the input are equally likely, and the trees they generate are called random binary search trees.
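
The insertion rule just described can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the `Node` class and the `insert` and `height` helpers are hypothetical names, keys are compared by their position in the calendar, and the height is counted in nodes along the longest path from the root (an assumed convention, chosen because it reproduces the height quoted in the caption of Fig. 2).

```python
MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]
RANK = {name: i for i, name in enumerate(MONTHS)}  # chronological order


class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert `key` below `root`: left if it is earlier in the year, right otherwise."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if RANK[key] < RANK[node.key]:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right


def height(node):
    """Number of nodes on the longest path from `node` down to a leaf."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


DATA = ["July", "September", "December", "May", "April", "February",
        "January", "October", "November", "March", "June", "August"]

root = None
for month in DATA:
    root = insert(root, month)

print(height(root))  # prints 5
```

The longest branch here is July, May, April, February, January (or March), which contains five nodes.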

A quantity that is widely used to measure the efficiency of such an algorithm is the maximum search time required to find an element. This is the worst-case scenario. The maximum search time is quantified by the height $H_N$ of a tree, i.e., the distance from the root node to the farthest element on the tree. In the example of Fig. 2, $H_N = 5$. Thus determining the statistics of $H_N$ is an extreme value problem. Apart from a slight modification, this problem can be mapped onto the directed polymer problem described in the previous example. In the modified problem, the bond energies are not completely uncorrelated as in the usual polymer problem, but are correlated in a special way: the energies $\epsilon_1$ and $\epsilon_2$ of two bonds emanating from the same parent (see Fig. 1) satisfy the constraint $e^{-\epsilon_1} + e^{-\epsilon_2} = 1$, and this constraint holds at every level of the tree; for example, in Fig. 1 each further pair of sibling bonds satisfies the same relation. The statistics of the height variable $H_N$ in the RBST problem was shown to be identical to that of the ground state energy $E_{\min}(n)$ of this modified polymer problem with the identification $N = 2^n$.
The traveling front analysis then gives the average height

$$\langle H_N \rangle \approx a_0 \ln N + a_1 \ln\ln N. \tag{10}$$

Here $a_0 = 1/v(\lambda^*)$ and $a_1 = -3/[2\lambda^* v(\lambda^*)]$, where $\lambda^* = 3.31107\ldots$ is the location of the maximum of the dispersion curve (9), with $v(\lambda^*) = 0.23196\ldots$. The traveling front retains its shape asymptotically, i.e., its width remains finite. This shows that the variance of the height $H_N$ remains finite even in the large $N$ limit. While some of the results mentioned here were also derived by computer scientists using rigorous mathematical bounds [26]-[28], the existence of a traveling front was not realized before. Besides, the traveling front techniques allow us to derive more detailed results such as the asymptotic behaviors of the full probability distribution of $H_N$ (not just its moments) and also to obtain completely new results for trees generated with arbitrary distributions, not necessarily uniform. The statistics of other observables such as the 'balanced' height of a tree (which corresponds to the maximum energy in the modified directed polymer problem) have also been derived exactly using the traveling front techniques.

Aggregation Dynamics of Growing Random Trees: In the RBST example discussed above, the trees have fixed size $N$. Alternatively, one can consider random trees generated dynamically via an aggregation mechanism where the size of the trees grows indefinitely with time $t$. Apart from the computer science problems discussed above, such growing trees also arise in physical situations such as collision processes in gases, where the largest Lyapunov exponent is related to the maximum height problem.

We studied a simple tree generation model where initially we have an infinite number of trivial (single-leaf) trees. Then, two trees are picked at random and attached to a common root. This merging process is repeated indefinitely (to simplify formulas the rate is set equal to 2). Let $c(t)$ be the number density of trees at time $t$. Initially, $c(0) = 1$, and then it evolves according to $dc/dt = -c^2$, whose solution is $c(t) = (1+t)^{-1}$. Hence the average number of leaves per tree $\langle N\rangle$ grows linearly with time, $\langle N\rangle = c^{-1} = 1 + t$. We are interested in the minimal and maximal heights of such a growing tree. Rather than considering the two extremal height distributions separately, we studied a more general model that interpolates between the two cases. In this model, each tree carries an extremal height $k$. The result of a merger between trees with extremal heights $k_1$ and $k_2$ is a new tree with extremal height $k$ given by

$$k = \begin{cases} \min(k_1, k_2) + 1 & \text{with prob. } p, \\ \max(k_1, k_2) + 1 & \text{with prob. } 1 - p. \end{cases} \tag{11}$$

Here, $p$ is a mixing parameter whose limits $p = 1$ and $p = 0$ correspond to the minimal and the maximal height problems, respectively.

The number density of trees with extremal height $k$, $c_k(t)$, evolves according to the master equation

$$\frac{dc_k}{dt} = c_{k-1}^2 - 2c\,c_k + 2p\,c_{k-1}\sum_{j=k}^{\infty} c_j + 2(1-p)\,c_{k-1}\sum_{j=0}^{k-2} c_j. \tag{12}$$

Introducing the cumulative fractions $A_k = c^{-1}\sum_{j=k}^{\infty} c_j$ and a new time variable $T = \int_0^t d\tau\, c(\tau) = \ln(1+t)$, we recast Eq. (12) into

$$\frac{dA_k}{dT} = -A_k + 2(1-p)A_{k-1} + (2p-1)A_{k-1}^2, \tag{13}$$

which should be solved subject to the step function initial conditions, $A_k(0) = 1$ for $k \le 0$ and $A_k(0) = 0$ otherwise.

In the long time limit, $A_k(T)$ approaches a traveling wave form, $A_k(T) \to f(k - vT)$. The velocity $v$ can be determined as in the above problems. Knowing this velocity one can compute the expected extremal tree height from the relation $\langle k\rangle = \sum_{k \ge 1} A_k$. Let us note one interesting fact. For $p = 0$ (the maximal height case), we can express $\langle k\rangle$ as a function of $\langle N\rangle$ (after eliminating $t$ using $\langle N\rangle = 1 + t$) for large $\langle N\rangle$ and find

$$\langle k\rangle \approx a_0 \ln\langle N\rangle + a_1 \ln\ln\langle N\rangle, \tag{14}$$

where $a_0$ and $a_1$ are identical to those in Eq. (10) for the fixed $N$ trees. This shows that the dynamically growing trees have the same asymptotic properties as the fixed-size trees for large trees if one replaces $N$ in Eq. (10) by $\langle N\rangle$. This dynamic approach is thus similar to the grand canonical approach in statistical mechanics, with the time $t$ playing the role of the chemical potential that can be chosen to fix the average size.
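
A rough numerical check of the traveling wave is sketched below for the maximal height case $p = 0$. It is an illustration under stated assumptions, not the analysis of the cited work: the evolution equation $dA_k/dT = -A_k + 2(1-p)A_{k-1} + (2p-1)A_{k-1}^2$ is integrated with a simple Euler scheme (the step `dT`, the cutoff `k_max` and the measurement times are arbitrary choices), and the selected velocity is taken to be the minimum of the spectrum $v(\lambda) = (2e^{\lambda} - 1)/\lambda$ obtained by inserting $A_k \propto e^{-\lambda(k - vT)}$ into the linearized equation with $p = 0$. All function names are hypothetical.

```python
import math


def mean_height(p, t_marks, k_max=80, dT=0.002):
    """Euler integration of dA_k/dT = -A_k + 2(1-p) A_{k-1} + (2p-1) A_{k-1}^2.

    Returns {T: <k>} with <k> = sum_{k>=1} A_k. A[0] = 1 stands for A_k with k <= 0.
    """
    A = [1.0] + [0.0] * k_max          # step initial condition
    mark_steps = {int(round(T / dT)): T for T in t_marks}
    result = {}
    for step in range(1, max(mark_steps) + 1):
        new = A[:]                     # A[0] stays equal to 1 at all times
        for k in range(1, k_max + 1):
            a = A[k - 1]
            new[k] = A[k] + dT * (-A[k] + 2.0 * (1.0 - p) * a + (2.0 * p - 1.0) * a * a)
        A = new
        if step in mark_steps:
            result[mark_steps[step]] = sum(A[1:])
    return result


def selected_velocity_max_height(tol=1e-12):
    """Minimum of v(lam) = (2 e^lam - 1)/lam, located where e^lam (1 - lam) = 1/2."""
    lo, hi = 0.0, 1.0                  # e^lam (1 - lam) - 1/2 changes sign on (0, 1)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.exp(mid) * (1.0 - mid) - 0.5 > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return (2.0 * math.exp(lam) - 1.0) / lam, lam


if __name__ == "__main__":
    heights = mean_height(0.0, (4.0, 8.0))
    slope = (heights[8.0] - heights[4.0]) / 4.0
    v_star, lam_star = selected_velocity_max_height()
    print(f"measured d<k>/dT = {slope:.3f}, spectrum minimum v* = {v_star:.5f}")
```

Since $T = \ln\langle N\rangle$ for large trees, the minimum of this spectrum plays the role of the coefficient of $\ln\langle N\rangle$; at the moderate times used here the measured slope is expected to lie below it because of the logarithmic correction.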

The Dynamics of Efficiency in a Simple Model: We now move from trees to the economy and consider a model that mimics the dynamics of efficiencies of competing agents. We represent the efficiency of each agent by a single nonnegative number. The efficiency of every agent can, independently of other agents, increase or decrease stochastically by a certain amount which we set equal to unity. In addition, the agents interact with each other, which is the fundamental driving mechanism of the economy. We assume that the interaction equates the efficiencies of underachievers to the efficiencies of better performing agents and set the rate of this process equal to one; we denote the rates of the increase and decrease of the efficiency by $p$ and $q$ respectively.

Let $h_i(t)$ be the efficiency of agent $i$ at time $t$ and $P(h,t)$ be the fraction of agents with efficiency $h$ at time $t$. The evolution equation for $P(h,t)$ is obtained by counting all possible gain and loss terms. For $h \ge 1$, this equation reads [9]



$$\frac{dP(h,t)}{dt} = -P(h,t)\sum_{h'=h+1}^{\infty} P(h',t) - (p+q)\,P(h,t) + q\,P(h+1,t) + p\,P(h-1,t) + P(h,t)\sum_{h'=0}^{h-1} P(h',t). \tag{15}$$

The cumulative distribution $F(h,t) = \sum_{h' \ge h} P(h',t)$ satisfies

$$\frac{\partial F(h,t)}{\partial t} = -F^2(h,t) + (1-p-q)\,F(h,t) + q\,F(h+1,t) + p\,F(h-1,t). \tag{16}$$



The function $F(h,t)$ approaches a traveling wave form, $F(h,t) \to f(h - vt)$. The velocity can be determined by repeating the steps detailed in the previous examples. The analysis is more cumbersome due to the appearance of the critical line $p_c(q)$ in the $(p,q)$ plane,



$$p_c(q) = \begin{cases} (\sqrt{q} - 1)^2 & \text{for } q > 1, \\ 0 & \text{for } q < 1, \end{cases} \tag{17}$$



separating different behaviors. For $p > p_c(q)$, the system is in the developing phase with the average efficiency $\langle h\rangle$ increasing according to

$$\langle h\rangle \approx v\,t - \frac{3}{2\lambda^*}\ln t + O(1). \tag{18}$$

Here again $\lambda^*$ is the decay rate, $f(x) \propto e^{-\lambda^* x}$ as $x \to \infty$. For $p < p_c(q)$ with $q > 1$, the system is localized and $\langle h\rangle$ approaches a time-independent constant in the long time limit. For $p = 0$ and $q < 1$, the system is in the developing phase for unbounded initial efficiency distributions, with the growth rate dependent on initial conditions. For economically relevant compact initial conditions, the regime $p = 0$ and $q < 1$ belongs to the stagnant phase.
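
The boundary between the developing and localized phases can be explored with a short numerical sketch. The assumptions are as follows: inserting $F(h,t) \propto e^{-\lambda(h - vt)}$ into the linearized equation for the cumulative distribution gives the spectrum $v(\lambda) = [1 - p - q + p\,e^{\lambda} + q\,e^{-\lambda}]/\lambda$, the front is taken to select its minimum over $\lambda > 0$, and a positive minimum is read as the developing phase. Under these assumptions the minimum changes sign at $p = (\sqrt{q} - 1)^2$ for $q > 1$. The function names and the search grid are hypothetical, and the sketch does not address the special case $p = 0$.

```python
import math


def spectrum(lam, p, q):
    """v(lam) from the linearized equation with the ansatz F ~ exp[-lam (h - v t)]."""
    return (1.0 - p - q + p * math.exp(lam) + q * math.exp(-lam)) / lam


def minimum_velocity(p, q, lam_max=20.0, steps=20000):
    """Grid search for the minimum of v(lam) over 0 < lam <= lam_max (illustrative grid)."""
    best_v, best_lam = math.inf, None
    for i in range(1, steps + 1):
        lam = lam_max * i / steps
        v = spectrum(lam, p, q)
        if v < best_v:
            best_v, best_lam = v, lam
    return best_v, best_lam


def critical_p(q):
    """Value of p at which the minimum of the spectrum changes sign (derived in this sketch)."""
    return (math.sqrt(q) - 1.0) ** 2 if q > 1.0 else 0.0


if __name__ == "__main__":
    q = 4.0
    for p in (0.8, 1.2):
        v, lam = minimum_velocity(p, q)
        phase = "developing" if v > 0.0 else "localized"
        print(f"q={q}, p={p}: min v = {v:.4f} at lambda = {lam:.3f} -> {phase}")
```

The sign change follows because the numerator of the spectrum has the minimum value $1 - (\sqrt{q} - \sqrt{p})^2$, attained at $e^{\lambda} = \sqrt{q/p}$.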

To summarize, we have illustrated that the asymptotic statistics of extreme values can often be analytically determined using the powerful traveling front techniques. There might be a deeper hidden connection between extreme value statistics and traveling fronts, and establishing such a connection is a challenging task. Finally, we mention that the recently computed [30] correction to the celebrated Bekenstein-Hawking (BH) formula for the black hole entropy, $S_{BH} = A_H/(4 l_{Pl}^2)$ (here $A_H$ is the classical horizon area and $l_{Pl}$ is the Planck length), is logarithmic. Specifically, for the four-dimensional non-rotating black hole, the entropy expansion [30]

$$S = S_{BH} - \frac{3}{2}\ln S_{BH} + \cdots \tag{19}$$

strikingly resembles the time dependence of the front position, Eq. (3). The logarithmic correction term also appears for other black holes [31]. It is therefore tempting to speculate about a traveling front structure in the black hole entropy. This speculation gets further strengthened by the fact that the BH entropy is the 'maximal' entropy that a black hole can have. In other words, this may be considered as an extreme value problem. The verification of the existence, if any, of a traveling front structure in the black hole problem remains an outstanding open problem.

We thank E. Ben-Naim and D. S. Dean for collaborations on some of these topics. We also thank J. K. Bhattacharjee for pointing out the possible connection to the black hole entropy problem.